Error Propagation for Approximate Policy and Value Iteration
Authors
Abstract
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum, as opposed to what has been suggested by previous results. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error term in the earlier iterations decays exponentially fast.
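To make the shape of such a statement concrete, the following is a schematic Lp error-propagation bound of the kind described above; the weights $\alpha_k$ and the concentrability term $C$ are illustrative placeholders and do not reproduce the paper's exact constants or definitions:
\[
\|Q^{*} - Q^{\pi_K}\|_{p,\rho}
\;\le\;
\frac{2\gamma}{(1-\gamma)^{2}}
\left[ \sum_{k=0}^{K-1} \alpha_k \, C^{1/p}\, \|\varepsilon_k\|_{p,\nu} \;+\; \gamma^{K} R_{\max} \right],
\qquad
\alpha_k \propto \gamma^{K-k-1},
\]
where $\varepsilon_k$ is the approximation error or Bellman residual at iteration $k$, $\nu$ is the distribution under which it is measured, $\rho$ is the distribution under which performance is evaluated, and $C$ is a concentrability coefficient defined through expectations of squared Radon-Nikodym derivatives rather than their suprema. The weights $\alpha_k \propto \gamma^{K-k-1}$ express the claim that errors in later iterations contribute more to the performance loss, while the influence of earlier errors decays exponentially fast.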
Similar papers
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
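For context, the exact (error-free) MPI recursion underlying this and the following entry can be written as follows, where $T_{\pi}$ is the Bellman operator of policy $\pi$ and $\mathcal{G}(v)$ denotes a policy that is greedy with respect to $v$:
\[
\pi_{k+1} \in \mathcal{G}(v_k),
\qquad
v_{k+1} = (T_{\pi_{k+1}})^{m}\, v_k ,
\qquad m \ge 1 .
\]
Setting $m = 1$ recovers value iteration and letting $m \to \infty$ recovers policy iteration; approximate MPI (AMPI) allows errors in the greedy step and/or in the $m$-step partial evaluation.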
Full text
Approximate modified policy iteration and its application to the game of Tetris
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
Full text
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we ca...
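As background, planning in a zero-sum two-player Markov game replaces the Bellman optimality operator by its minimax counterpart, written here schematically for simultaneous moves:
\[
(T v)(s) \;=\; \max_{\mu \in \Delta(A)} \min_{\nu \in \Delta(B)}
\mathbb{E}_{a \sim \mu,\, b \sim \nu}
\big[ r(s,a,b) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a,b)}[\, v(s')\,] \big] .
\]
This operator is again a $\gamma$-contraction in the supremum norm, which is why the error-propagation machinery developed for approximate value and policy iteration can be adapted to the game setting.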
Full text
Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies
We consider approximate dynamic programming for the infinite-horizon stationary γ-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal policy that is stationary, we show that when using value function approximation, looking for a non-stationary policy may lead to a better performance guarantee. We def...
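To illustrate the kind of improvement meant here (schematically; the constants below are the classical ones and are not taken from this paper's exact statements), the standard guarantee for approximate policy iteration with stationary policies and per-iteration error bounded by $\epsilon$ reads
\[
\limsup_{k \to \infty} \|v^{*} - v^{\pi_k}\|_{\infty} \;\le\; \frac{2\gamma}{(1-\gamma)^{2}}\,\epsilon ,
\]
whereas executing a periodic non-stationary policy that cycles through the last $\ell$ computed policies can replace one factor $1/(1-\gamma)$ by the milder $1/(1-\gamma^{\ell})$, i.e., a bound of roughly the form $\frac{2\gamma}{(1-\gamma)(1-\gamma^{\ell})}\,\epsilon$.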
Full text
Error Bounds for Approximate Policy Iteration
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results (in discounted problems) from a contraction property of the back-up operator, guaranteeing convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by suc...
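The contraction property referred to here is the standard $\gamma$-contraction of the Bellman back-up operator $T$ in the supremum norm:
\[
\|T v_1 - T v_2\|_{\infty} \;\le\; \gamma\, \|v_1 - v_2\|_{\infty},
\qquad 0 \le \gamma < 1 ,
\]
so, by Banach's fixed-point theorem, exact value iteration $v_{k+1} = T v_k$ converges to the unique fixed point $v^{*} = T v^{*}$ at a geometric rate. The bounds discussed in this entry quantify what remains of that guarantee when each back-up or policy evaluation step is computed only approximately.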
Full text